WB-index: A sum-of-squares based index for cluster validity
Abstract
Article history: Received 5 September 2012; received in revised form 2 May 2014; accepted 11 July 2014; available online 17 July 2014.
Determining the number of clusters is an important part of cluster validity that has been widely studied in cluster analysis. Sum-of-squares based indices show promising properties for determining the number of clusters. However, knee point detection is often required because most indices change monotonically as the number of clusters increases. Therefore, indices with a clear minimum or maximum value are preferred. The aim of this paper is to revisit a sum-of-squares based index called the WB-index, which attains its minimum at the determined number of clusters. We shed light on the relation between the WB-index and two popular indices, the Calinski–Harabasz index and the Xu-index. According to a theoretical comparison, the Calinski–Harabasz index is shown to be affected by the data size and the level of data overlap. The Xu-index is theoretically close to the WB-index; however, it does not work well when the dimension of the data is greater than two. Here, we conduct a more thorough comparison of 12 internal indices and provide a summary of the experimental performance of the different indices. Furthermore, we introduce the sum-of-squares based indices into automatic keyword categorization, where the indices are specially defined for determining the number of clusters. © 2014 Elsevier B.V. All rights reserved.
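The abstract does not state the formulas, so the following is only a minimal sketch under assumptions. It assumes the commonly cited definitions WB = M · SSW / SSB and CH = (SSB / (M − 1)) / (SSW / (N − M)), where M is the number of clusters, N the number of points, SSW the within-cluster sum of squares and SSB the between-cluster sum of squares. The function names and the toy data are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def sums_of_squares(points, labels):
    """Return (SSW, SSB) for a hard partition of `points` given by `labels`."""
    n = len(points)
    dim = len(points[0])
    # Mean of the whole data set.
    global_mean = [sum(p[d] for p in points) / n for d in range(dim)]
    # Group the points by their cluster label.
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    ssw = 0.0
    ssb = 0.0
    for members in clusters.values():
        size = len(members)
        centroid = [sum(p[d] for p in members) / size for d in range(dim)]
        # Within-cluster part: squared distances of members to their centroid.
        ssw += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
        # Between-cluster part: size-weighted squared distance of the
        # centroid to the global mean.
        ssb += size * sum((centroid[d] - global_mean[d]) ** 2 for d in range(dim))
    return ssw, ssb


def wb_index(points, labels):
    """Assumed definition WB = M * SSW / SSB; lower is better. Needs M >= 2."""
    m = len(set(labels))
    ssw, ssb = sums_of_squares(points, labels)
    return m * ssw / ssb


def calinski_harabasz(points, labels):
    """Assumed definition CH = (SSB/(M-1)) / (SSW/(N-M)); higher is better."""
    n = len(points)
    m = len(set(labels))
    ssw, ssb = sums_of_squares(points, labels)
    return (ssb / (m - 1)) / (ssw / (n - m))


# Toy data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
good = [0, 0, 0, 0, 1, 1, 1, 1]   # matches the two groups
bad = [0, 1, 0, 1, 0, 1, 0, 1]    # mixes the two groups

print("WB good:", wb_index(points, good), "WB bad:", wb_index(points, bad))
print("CH good:", calinski_harabasz(points, good),
      "CH bad:", calinski_harabasz(points, bad))
```

In practice one would run a clustering algorithm for a range of M and pick the M with the smallest WB value. The explicit factor N − M in the assumed Calinski–Harabasz formula gives one way to see how the data size can enter that index.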
Similar resources
Sum-of-Squares Based Cluster Validity Index and Significance Analysis
Different clustering algorithms achieve different results to certain data sets because most clustering algorithms are sensitive to the input parameters and the structure of data sets. Cluster validity, as the way of evaluating the result of the clustering algorithms, is one of the problems in cluster analysis. In this paper, we build up a framework for cluster validity process, meanwhile a sum-...
The Extremal Graphs for (Sum-) Balaban Index of Spiro and Polyphenyl Hexagonal Chains
As highly discriminant distance-based topological indices, the Balaban index and the sum-Balaban index of a graph $G$ are defined as $J(G)=\frac{m}{\mu+1}\sum\limits_{uv\in E} \frac{1}{\sqrt{D_{G}(u)D_{G}(v)}}$ and $SJ(G)=\frac{m}{\mu+1}\sum\limits_{uv\in E} \frac{1}{\sqrt{D_{G}(u)+D_{G}(v)}}$, respectively, where $D_{G}(u)=\sum\limits_{v\in V}d(u,v)$ is the distance sum of vertex $u$ in $G$, $m$ is the n...
Validation and Localization of the Persian Version of Short form the Index of Ability and Readiness of Performing the Mission in Military Nurses
Background and Aim: Nursing is an important subset of the health care system to act in critical situations. Military and civilian nurses are among the first to appear on the scene and provide services in the event of an accident or disaster, and military nurses play a double role in times of crisis due to their special security dimension. Assessing the capability and readiness of military nurse...
Quantitative Evaluation of Performance and Validity Indices for Clustering the Web Navigational Sessions
Clustering techniques are widely used in “Web Usage Mining” to capture similar interests and trends among users accessing a Web site. For this purpose, web access logs generated at a particular web site are preprocessed to discover the user navigational sessions. Clustering techniques are then applied to group the user session data into user session clusters, where intercluster similarities are...
Comparison of Topological Indices Based on Iterated ‘Sum’ versus ‘Product’ Operations
The Padmakar-Ivan (PI) index is a first-generation topological index (TI) based on sums over all edges between numbers of edges closer to one endpoint and numbers of edges closer to the other endpoint. Edges at equal distances from the two endpoints are ignored. An analogous definition is valid for the Wiener index W, with the difference that sums are replaced by products. A few other TIs are d...
Journal title:
- Data Knowl. Eng.
Volume 92
Publication date 2014